Automatic Construction of News Hypertext
Abstract
Hypertext information retrieval systems combine hypertext and information retrieval capabilities by providing retrieval techniques that include direct search as well as navigation and browsing. Incorporating a mechanism for the automatic construction of hypertext into an IR system is important for exploiting the advantages of integrating hypertext with information retrieval. News hypertext deals with articles from newspaper archives. Formal ways to capture the temporal aspects that characterize the newspaper domain, and which are often ignored by IR systems, are presented. The suggested framework is conceptualized with the notions of stories and threads, which are substories within a story. Threads are identified by applying clustering techniques to article segments that correspond to subtopics within the main topic of an article, and then linking the segments that belong to the same cluster. Finally, an evaluation of this approach to the automatic construction of hypertext is presented, in terms of its usability and the structural quality of the resulting hypertext.
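The thread-identification step described in the abstract can be illustrated with a minimal sketch. The abstract does not say which clustering technique or similarity measure is used, so everything specific here is an assumption made for illustration: term-frequency vectors, cosine similarity, single-link clustering with a fixed threshold, and linking same-cluster segments in date order to reflect the temporal aspect. The names `term_vector`, `cosine`, `cluster_segments`, and `thread_links`, and the sample data, are hypothetical.

```python
import math
import re
from collections import Counter
from itertools import combinations


def term_vector(text):
    """Bag-of-words term-frequency vector for one segment (assumed representation)."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def cluster_segments(segments, threshold=0.5):
    """Single-link clustering (an assumed technique): any two segments whose
    similarity reaches the threshold end up in the same cluster."""
    vectors = [term_vector(s["text"]) for s in segments]
    parent = list(range(len(segments)))  # union-find forest

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i, j in combinations(range(len(segments)), 2):
        if cosine(vectors[i], vectors[j]) >= threshold:
            parent[find(j)] = find(i)

    clusters = {}
    for i in range(len(segments)):
        clusters.setdefault(find(i), []).append(i)
    return list(clusters.values())


def thread_links(segments, clusters):
    """Link each segment to the next one (by date) in the same cluster.
    Ordering by date is an assumption, not something the abstract specifies."""
    links = []
    for members in clusters:
        ordered = sorted(members, key=lambda i: segments[i]["date"])
        links.extend(zip(ordered, ordered[1:]))
    return links


# Hypothetical segments: article a1 covers two subtopics.
segments = [
    {"article": "a1", "date": "1997-01-03", "text": "strike talks union wages strike"},
    {"article": "a2", "date": "1997-01-05", "text": "union wages talks resume"},
    {"article": "a1", "date": "1997-01-03", "text": "airport delays flights cancelled"},
    {"article": "a3", "date": "1997-01-06", "text": "flights delays airport reopen"},
]
clusters = cluster_segments(segments, threshold=0.4)
links = thread_links(segments, clusters)
```

In this toy input, the two segments of article a1 fall into different clusters, so each becomes the starting point of a separate thread that continues in a later article.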
Similar Resources
NHS: A Tool for the Automatic Construction of News Hypertext
The automatic construction of hypertext is an important part of the hypertext authoring process. This paper presents the NHS system, a system that automatically creates links for news hypertext which is tailored to the domain of newspaper archives. The suggested framework is conceptualized with the notions of stories and threads, which are substories within a story. Threads are identified by ap...
Automatically generating hypertext by computing semantic similarity
We describe a novel method for automatically generating hypertext links within and between newspaper articles. The method is based on lexical chaining, a technique for extracting the sets of related words that occur in texts. Links between the paragraphs of a single article are built by considering the distribution of the lexical chains in that article. Links between articles are built by consi...
A Semantic-driven Approach to Hypertextual Authoring
Even if the use of the hypertext paradigm is nowadays very diffused, its potential benefits are not completely exploited by the community of the users. This is particularly evident in the case of the news agencies. The major reasons for the above limitation are the high costs for manually creating and maintaining the sets of complete links of a large-scale hypertext. This is especially true for...
Automatic Extraction of Textual Elements from News Web Pages
In this paper we present an algorithm for automatic extraction of textual elements, namely titles and full text, associated with news stories in news web pages. We propose a supervised machine learning classification technique based on the use of a Support Vector Machine (SVM) classifier to extract the desired textual elements. The technique uses internal structural features of a webpage withou...
TACHIR: A Tool for Automatic Construction of Hypertexts for Information Retrieval
The paper describes the design and implementation of TACHIR, a prototype tool for the automatic construction of hypertexts for Information Retrieval. TACHIR builds up automatically an IR hypertext, a hypertext to be used for information retrieval, from a document collection, using a methodology that makes use of a set of well known Information Retrieval techniques. The structure of the IR hyper...
Publication date: 1997